
Skip Triton import for AMD #5110

Merged: 4 commits merged into master from lekurile/disable_triton_amd on Feb 9, 2024
Conversation

@lekurile (Contributor) commented on Feb 9, 2024

When testing DeepSpeed inference on an `AMD Instinct MI250X/MI250` GPU, the `pytorch-triton-rocm` module would break the `torch.cuda` device API. To address this, the `triton` import is skipped when the GPU is determined to be AMD.

This change allows DeepSpeed to run on an AMD GPU without kernel injection in the DeepSpeedExamples [text-generation example](https://github.com/microsoft/DeepSpeedExamples/tree/master/inference/huggingface/text-generation) using the following command:

```bash
deepspeed --num_gpus 1 inference-test.py --model facebook/opt-125m
```

TODO: Root-cause the interaction between `pytorch-triton-rocm` and DeepSpeed to understand why it breaks the `torch.cuda` device API.
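For context, a conditional import guard of roughly this shape could implement the behavior described above. This is a minimal sketch, not the PR's actual diff: the `_running_on_amd` helper name is hypothetical, and the AMD check assumes a ROCm build of PyTorch, which exposes a non-`None` `torch.version.hip`.

```python
import torch


def _running_on_amd() -> bool:
    # Hypothetical helper: ROCm builds of PyTorch set torch.version.hip,
    # while CUDA builds leave it as None.
    return getattr(torch.version, "hip", None) is not None


HAS_TRITON = False
if not _running_on_amd():
    # Attempt the triton import only on non-AMD devices, since
    # pytorch-triton-rocm was observed to break the torch.cuda device API.
    try:
        import triton  # noqa: F401
        HAS_TRITON = True
    except ImportError:
        HAS_TRITON = False
```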

deepspeed/__init__.py: review comment (outdated, resolved)
@mrwyattii merged commit d04a838 into master on Feb 9, 2024
15 checks passed
@mrwyattii deleted the lekurile/disable_triton_amd branch on February 9, 2024 at 22:44
mauryaavinash95 pushed a commit to mauryaavinash95/DeepSpeed that referenced this pull request Feb 17, 2024
rraminen pushed a commit to ROCm/DeepSpeed that referenced this pull request May 9, 2024